Abstract

We study the problem of achieving average consensus among a group of agents over a network with erasure links.
In the context of consensus problems, the unreliability of communication links between nodes has been traditionally 
modeled by allowing the underlying graph to vary with time. In other words, depending on the realization of the link 
erasures, the underlying graph at each time instant is assumed to be a subgraph of the original graph. Implicit in this 
model is the assumption that the erasures are symmetric: if at time t the packet from node i to node j is dropped, the 
same is true for the packet transmitted from node j to node i. However, in practical wireless communication systems 
this assumption is unreasonable and, due to the lack of symmetry, standard averaging protocols cannot guarantee that 
the network will reach consensus to the true average. In this paper we explore the use of channel coding to improve 
the performance of consensus algorithms. For symmetric erasures, we show that, for certain ranges of the system 
parameters, repetition codes can speed up the convergence rate. For asymmetric erasures we show that tree codes 
(which have recently been designed for erasure channels) can be used to simulate the performance of the original
"unerased" graph. Thus, unlike conventional consensus methods, we can guarantee convergence to the average in the
asymmetric case. The price is a slowdown in the convergence rate relative to the unerased network; the resulting rate
is nonetheless often faster than the convergence rate of conventional consensus algorithms over noisy links.

I. Introduction 

In a network of agents, consensus refers to the process of achieving agreement among the agents in a distributed
manner. Consensus problems, and in particular the problem of reaching consensus on the average of the agents' values,
have a long history and often serve as a test case for studying distributed computation and decision making among a
group of nodes/processors/dynamical systems ([1]-[6]). Most of the work in this area assumes that the agents are
connected via a fixed underlying graph or network. In many applications, however, the links in the underlying graph
are noisy or unreliable. In the context of consensus problems, the unreliability of communication links between nodes
has traditionally been modeled by allowing the underlying graph to vary with time. In other words, at each time instant
some of the links are allowed to be erased, and, depending on the realization of the link erasures, the underlying graph
at each time instant is assumed to be a subgraph of the original graph. Furthermore, the distributed algorithm for
reaching consensus remains unchanged: the same distributed averaging algorithm is used, except that only the
information received at each time is used. An important assumption implicit in this model is that the erasures are
symmetric: if at time t the packet from node i to node j is dropped, the same is true for the packet transmitted from
node j to node i. In practical wireless communication systems this assumption is patently unreasonable: the additive
noise at the two nodes is independent and, furthermore, communication in the two directions occurs either at different
times or over different frequency bands. If standard averaging protocols are used, this loss of symmetry can prevent the
network from reaching consensus on the true average (standard consensus protocols require the "update" matrix to be
doubly stochastic, which cannot be guaranteed in the asymmetric case).

The goal of this paper is to explore the use of channel coding to improve the performance of consensus algorithms,
especially in the asymmetric case. A major impetus for this work is the recent design of tree codes for erasure
channels [7], which, as we demonstrate, resolve the problem encountered in the asymmetric case.

For asymmetric erasures we show that tree codes can be used to simulate the performance of the original unerased
graph. Thus, unlike conventional consensus methods, we can guarantee convergence to the average in the asymmetric
case. As expected, the price is a slowdown in the convergence rate relative to that of the unerased network.
Nonetheless, the resulting convergence rate is still often faster than that of conventional consensus algorithms over
erasure links.

II. Problem Setup 

Consider a group of N nodes denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$. We assume that the nodes are connected by an
undirected communication graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, which is often referred to as the interaction graph. Throughout the
analysis, $\mathcal{G}$ is assumed to be time invariant. Let $A = [a_{ij}]$ denote the adjacency matrix of $\mathcal{G}$, i.e., $a_{ij} = 1$ if $(i,j) \in \mathcal{E}$ and
$a_{ij} = 0$ otherwise. Note that $a_{ii} = 0$. Let $x_0^i$ denote the initial value at node i. The objective is for the nodes to compute
the global average $r = \frac{1}{N}\mathbf{1}^T x_0$, where $\mathbf{1}$ denotes an N-dimensional column vector of ones and $x_0$ is the column vector of
the $x_0^i$'s. We model the communication links between nodes as packet erasure links. Further, we ignore quantization
effects due to packetization; the standard packet sizes used in practice justify this assumption. We denote the event of
successful packet reception from node j to node i at time k by the Bernoulli random variable $X_k^{ij}$, i.e., $X_k^{ij} = 1$ if
the packet is received successfully at time k and $X_k^{ij} = 0$ otherwise. This notation is summarized in Table I.
III. Background

For a fixed communication graph $\mathcal{G}$, a typical algorithm to achieve consensus has the following form:

$$x_{k+1}^i = W_{ii}\, x_k^i + \sum_{j \neq i} W_{ij}\, x_k^j \tag{1}$$

where W obeys the underlying graph, i.e., for $i \neq j$, $W_{ij} = 0$ if $(i,j) \notin \mathcal{E}$. In other words, each node updates its value by
taking a weighted sum of its own previous value and those of its neighbors. In compact form, the update can be written as

$$x_{k+1} = W x_k \tag{2}$$

Such an algorithm is said to achieve consensus if

$$\lim_{k\to\infty} x_k = r\mathbf{1}, \quad \forall\, x_0 \in \mathbb{R}^N \tag{3}$$

In such a static setup, where the weights and the underlying interaction graph do not change with time, it is well
known that consensus is achieved if and only if

$$\lim_{k\to\infty} W^k = \frac{1}{N}\mathbf{1}\mathbf{1}^T \tag{4}$$

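Furthermore, (4) holds if and only if the following conditions hold (see, e.g., [8]):

1) W is doubly stochastic, i.e.,

$$\mathbf{1}^T W = \mathbf{1}^T, \quad W\mathbf{1} = \mathbf{1} \tag{5}$$

2) $\rho\left(W - \frac{1}{N}\mathbf{1}\mathbf{1}^T\right) < 1$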
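The two conditions can be checked numerically for a given weight matrix. The following is a minimal sketch assuming NumPy; the function name `satisfies_average_consensus` and the tolerance `tol` are illustrative choices, not notation from this paper.

```python
import numpy as np

def satisfies_average_consensus(W, tol=1e-9):
    """Check condition 1) (W doubly stochastic) and condition 2) (rho(W - 11^T/N) < 1)."""
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    ones = np.ones(N)
    doubly_stochastic = (np.allclose(ones @ W, ones, atol=tol)
                         and np.allclose(W @ ones, ones, atol=tol))
    J = np.outer(ones, ones) / N                      # (1/N) 1 1^T
    rho = np.max(np.abs(np.linalg.eigvals(W - J)))    # spectral radius of W - J
    return bool(doubly_stochastic and rho < 1 - tol)

# Laplacian of the path graph 1 - 2 - 3 (eigenvalues 0, 1, 3)
L_path = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 1]], dtype=float)
print(satisfies_average_consensus(np.eye(3) - 0.5 * L_path))  # True
print(satisfies_average_consensus(np.eye(3)))                 # False: rho = 1, no mixing
```

For the path graph, $W = I - 0.5\mathcal{L}$ has eigenvalues 1, 0.5 and -0.5, so removing the consensus direction leaves a spectral radius of 0.5.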

Note that $x_k = W^k x_0$. Under the above conditions, $x_k \to \frac{1}{N}\mathbf{1}\mathbf{1}^T x_0 = r\mathbf{1}$. The convergence rate $\mu(W)$ of the above
consensus algorithm is formally defined as

$$\mu(W) = \sup_{x_0 \neq r\mathbf{1}} \lim_{k\to\infty} \left(\frac{\|x_k - r\mathbf{1}\|}{\|x_0 - r\mathbf{1}\|}\right)^{1/k} \tag{6}$$

and is given by $\mu(W) = \rho\left(W - \frac{1}{N}\mathbf{1}\mathbf{1}^T\right)$. There is a considerable body of work that explores different choices of
W and how they affect the rate of convergence of the consensus algorithm (e.g., [8]).

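For the purpose of this paper, and for ease of exposition, we use a specific but natural choice of W (e.g., [1]) given
by $W = I - \epsilon\mathcal{L}$, where $\mathcal{L}$ is the Laplacian of the interaction graph $\mathcal{G}$, i.e., $\mathcal{L} = D - A$ with $D = \mathrm{diag}\{\Delta_i\}$, where $\Delta_i$ is
the degree of node i. Let $0 = \lambda_N(\mathcal{L}) \le \lambda_{N-1}(\mathcal{L}) \le \cdots \le \lambda_1(\mathcal{L})$ denote the eigenvalues of $\mathcal{L}$. The multiplicity of
the zero eigenvalue equals the number of connected components of the graph, and $\lambda_{N-1}(\mathcal{L}) > 0$ if and only if the graph
is connected.

For such a choice of W, the spectral radius is given by $\rho\left(W - \frac{1}{N}\mathbf{1}\mathbf{1}^T\right) = \max\{1 - \epsilon\lambda_{N-1}(\mathcal{L}),\ \epsilon\lambda_1(\mathcal{L}) - 1\}$. We
state this as a lemma for later reference.

Lemma 3.1: The convergence rate $\mu$ of (1) with $W = I - \epsilon\mathcal{L}$ is

$$\mu = \max\{1 - \epsilon\lambda_{N-1}(\mathcal{L}),\ \epsilon\lambda_1(\mathcal{L}) - 1\} \tag{7}$$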
Hence, for a connected graph, conditions 1) and 2) above are satisfied if and only if $0 < \epsilon < 2/\lambda_1(\mathcal{L})$. Furthermore, $\mu$ is
minimized (i.e., convergence is fastest) when the two quantities in (7) coincide, i.e., when

$$\epsilon = \epsilon^* = \frac{2}{\lambda_1(\mathcal{L}) + \lambda_{N-1}(\mathcal{L})} \tag{8}$$

In particular, any $\epsilon < 1/\Delta$ will work, where $\Delta = \max_i \Delta_i$, since $\lambda_1(\mathcal{L}) \le 2\Delta$. We remark that the techniques presented in this
paper are independent of the choice of the weight matrix W. Whenever we wish to write closed-form expressions for the
convergence rates, we use the specific choice $W = I - \epsilon^*\mathcal{L}$ for simplicity.
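To make Lemma 3.1 and (8) concrete, the sketch below computes $\epsilon^*$ and $\mu$ from the Laplacian spectrum and then checks that the error $\|x_k - r\mathbf{1}\|$ contracts by $\mu$ per iteration. It assumes NumPy and a connected graph; the helper names (`laplacian`, `optimal_step`) and the 4-cycle example are hypothetical.

```python
import numpy as np

def laplacian(adj):
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj      # L = D - A

def optimal_step(adj):
    """Return (eps_star, mu) from (8) and (7); assumes a connected graph."""
    lam = np.linalg.eigvalsh(laplacian(adj))   # ascending: lam[0] = 0 = lambda_N
    lam_1, lam_Nm1 = lam[-1], lam[1]           # lambda_1(L) and lambda_{N-1}(L)
    eps_star = 2.0 / (lam_1 + lam_Nm1)
    mu = max(1 - eps_star * lam_Nm1, eps_star * lam_1 - 1)
    return eps_star, mu

# 4-cycle: Laplacian eigenvalues 0, 2, 2, 4
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
eps_star, mu = optimal_step(C4)

# Empirical per-iteration contraction of ||x_k - r 1|| under W = I - eps* L
W = np.eye(4) - eps_star * laplacian(C4)
x = np.array([1.0, 0.0, 0.0, 0.0])
r = x.mean()
err0 = np.linalg.norm(x - r)
k = 10
for _ in range(k):
    x = W @ x
print(eps_star, mu, (np.linalg.norm(x - r) / err0) ** (1 / k))
```

For the 4-cycle, (8) gives $\epsilon^* = 2/(4 + 2) = 1/3$ and (7) gives $\mu = 1/3$; since $W$ has eigenvalues $\pm 1/3$ on the subspace orthogonal to $\mathbf{1}$, all three printed numbers are 1/3 up to rounding.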



IV. Communication Model 

In practice, the communication links between nodes can be unreliable. Conventionally, this has been taken into
account by allowing the interaction topology to change with time. Thus, at time k, the connectivity between nodes
is described by a graph $\mathcal{G}_k$ that can now vary with time. There is a considerable amount of literature on
achieving consensus under such time-varying interaction topologies ([2], [6], [9]-[11]). We model
unreliable communication as packet erasures. At each time k, the packet transmitted from node i to, say, node j
is either received ($X_k^{ji} = 1$) or erased ($X_k^{ji} = 0$). Similarly, the packet sent from node j to node i is either received
($X_k^{ij} = 1$) or erased ($X_k^{ij} = 0$). We consider two erasure models:

1) Symmetric: $X_k^{ij} = X_k^{ji}$, and $X_k^{ij}$, $X_k^{lm}$ are independent of each other whenever $(i,j) \notin \{(l,m), (m,l)\}$.

2) Asymmetric: $X_k^{ij}$, $X_k^{lm}$ are independent of each other whenever $(i,j) \neq (l,m)$; in particular, $X_k^{ij}$ and $X_k^{ji}$ are
independent.

The literature on consensus over time-varying topologies only captures the symmetric case. Even though consensus
has been established under very general conditions, little appears to be known about the rate of convergence.
Under the asymmetric erasure model, the resulting interaction graph is effectively directed: an edge between nodes
i and j is replaced by a pair of directed edges, and the effective graph at any time depends on which packets were
erased in that round. Under this setup, we define the adjacency matrix $A = [a_{ij}]$ and the Laplacian $\mathcal{L}$ as follows:
$a_{ij} = 1$ if $(i \leftarrow j) \in \mathcal{E}$, and $\mathcal{L} = D - A$ with $D = \mathrm{diag}\{\Delta_i\}$ and $\Delta_i = \sum_j a_{ij}$. The resulting adjacency matrix and the
Laplacian are not symmetric in general. As a result, $W = I - \epsilon\mathcal{L}$ is not doubly stochastic either, i.e., $\mathbf{1}^T W \neq \mathbf{1}^T$ in general. When the
graph $\mathcal{G}$ is directed, Olfati-Saber and Murray (2007) prove that average consensus is achieved using a fixed $W = I - \epsilon\mathcal{L}$
if and only if the interaction graph $\mathcal{G}$ is balanced, i.e., the in-degree of each node equals its out-degree. When the
link failures are random, however, the resulting interaction graph will generally not be balanced at every time step.
As we show later, coding can overcome this problem.
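The difference between the two erasure models can be seen by sampling the indicators $X_k^{ij}$ and forming the effective Laplacian (the degree matrix minus the adjacency matrix restricted to received links): its rows always sum to zero, but its columns (in-degree minus out-degree of each node) sum to zero only when the effective graph is balanced. A minimal sketch in plain Python; the function names, the 4-cycle, p = 0.3 and the seed are illustrative assumptions.

```python
import random

def sample_erasures(adj, p, symmetric, rng):
    """X[i][j] = 1 if the packet from j to i is received (prob. 1 - p), 0 if erased."""
    N = len(adj)
    X = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if not adj[i][j]:
                continue
            if symmetric and j < i:
                X[i][j] = X[j][i]          # symmetric model: one draw per undirected link
            else:
                X[i][j] = 1 if rng.random() >= p else 0
    return X

def effective_laplacian(adj, X):
    """D_k - A_k with A_k[i][j] = a_ij X[i][j] and D_k the diagonal of row sums of A_k."""
    N = len(adj)
    A_k = [[adj[i][j] * X[i][j] for j in range(N)] for i in range(N)]
    return [[(sum(A_k[i]) if i == j else 0) - A_k[i][j] for j in range(N)] for i in range(N)]

def is_balanced(L):
    """Rows sum to zero by construction; balance (in-degree = out-degree) <=> columns do too."""
    N = len(L)
    return all(sum(L[i][j] for i in range(N)) == 0 for j in range(N))

C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
rng = random.Random(1)
for symmetric in (True, False):
    draws = [effective_laplacian(C4, sample_erasures(C4, 0.3, symmetric, rng)) for _ in range(200)]
    print("symmetric" if symmetric else "asymmetric",
          sum(is_balanced(L) for L in draws), "of 200 draws balanced")
```

Under the symmetric model every draw is balanced, because the effective Laplacian is itself symmetric; under the asymmetric model some draws are not.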

V. Does Coding Help? 

It turns out that coding does help. To study its effect, however, we need to distinguish between the symmetric
and asymmetric erasure models. When the erasures are symmetric, i.e., when $X_k^{ij} = X_k^{ji}$, node i
(respectively, node j) knows what node j (respectively, node i) has received. For example, if node i successfully received
a packet from node j, it knows that node j also successfully received the packet intended for it; alternatively, if node i
receives an erasure from node j, it knows that the packet intended for node j was also erased. In this case, the links
between the different nodes are erasure links with feedback (the transmitter knows what the receiver receives).

For erasure links with feedback it is well known that the optimal coding scheme is retransmission, i.e., the transmitter
retransmits its packet until it is received at the receiver.

When the erasures are not symmetric, a more sophisticated coding scheme, namely tree codes, is needed. We
explain this further below.

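When there are erasures and no coding is used, an iteration of the consensus algorithm at node i is given by

$$x_{k+1}^i = x_k^i - \epsilon\sum_j a_{ij}X_k^{ij}\left(x_k^i - x_k^j\right) \tag{9}$$

The effective adjacency matrix at time k is then $A_k = A \circ X_k$, where $X_k = [X_k^{ij}]$ and $\circ$ denotes the elementwise (Hadamard)
product. The associated Laplacian is $\mathcal{L}_k = D_k - A_k$, where $D_k = \mathrm{diag}\{\Delta_k^i\}$ and $\Delta_k^i = \sum_j a_{ij}X_k^{ij}$.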
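A direct simulation of (9) under symmetric erasures illustrates that the average is preserved (each effective Laplacian is symmetric, so $\mathbf{1}^T\mathcal{L}_k = 0$) and that the state still converges to $r\mathbf{1}$. This sketch assumes NumPy; the 4-cycle, p = 0.3, $\epsilon = 0.25$ (below $1/\Delta = 0.5$), the seed and the helper names are illustrative choices.

```python
import numpy as np

def uncoded_step(x, adj, X, eps):
    """One iteration of (9): x_i <- x_i - eps * sum_j a_ij X_ij (x_i - x_j)."""
    A_k = adj * X                              # A o X_k (elementwise product)
    L_k = np.diag(A_k.sum(axis=1)) - A_k       # L_k = D_k - A_k
    return x - eps * (L_k @ x)

def symmetric_draw(adj, p, rng):
    """Symmetric erasures: one 'received' indicator (prob. 1 - p) per undirected link."""
    U = np.triu((rng.random(adj.shape) >= p).astype(float), 1)
    return U + U.T

adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = np.array([4.0, 0.0, 2.0, -2.0])
r = x.mean()                                   # r = 1.0
for _ in range(500):
    x = uncoded_step(x, adj, symmetric_draw(adj, 0.3, rng), eps=0.25)
print(x, x.mean())                             # entries close to r, mean unchanged
```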

1) Symmetric Erasures: In this case, even without coding the nodes achieve average consensus, albeit
at a slower rate that depends on the erasure probability, say p. We show that coding (in this case, retransmitting until
successful reception) results in faster convergence whenever there exists a constant $R' > 0$ such that

$$D(1 - R' \,\|\, p) > \log(\Delta + 1) \tag{10}$$

$$\mu^{R'} < \sqrt{|\lambda_2(\Gamma_s)|} \tag{11}$$

where $\mu$ is as in (7), $D(\cdot\,\|\,\cdot)$ denotes the binary relative entropy (Kullback-Leibler divergence), $H(\cdot)$ is the binary
entropy function, and $\Gamma_s$ is defined in Lemma 8.1.



2) Asymmetric Erasures: Since $X_k^{ij}$ and $X_k^{ji}$ are independent, they are not equal in general. Note that
$(I - \epsilon\mathcal{L}_k)\mathbf{1} = \mathbf{1}$, but $\mathbf{1}^T(I - \epsilon\mathcal{L}_k) \neq \mathbf{1}^T$ in general, which violates (5). Furthermore, the associated graph is
not balanced either: $\sum_j a_{ij}X_k^{ij} \neq \sum_j a_{ji}X_k^{ji}$ in general. In this case, the nodes will not achieve average consensus.
Under very mild conditions, however, it is well known that the nodes reach agreement, i.e., $x_k \to Y\mathbf{1}$, where Y is a random
variable that does not necessarily concentrate around the initial average r. Tree codes, on the other hand, allow us to
simulate the original recursion (1), and hence guarantee asymptotic average consensus. Before proceeding further, we
provide a brief introduction to tree codes.
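Repeating the simulation of (9) with independent draws in each direction shows the behavior described above: the entries of $x_k$ still agree, but on a value that depends on the erasure realization rather than on r. A sketch assuming NumPy, with an illustrative 4-cycle, p = 0.3, $\epsilon = 0.25$ and three seeds; the helper name is hypothetical.

```python
import numpy as np

def uncoded_step(x, adj, X, eps):
    """One iteration of (9); X[i, j] = 1 means the packet from j to i was received."""
    A_k = adj * X
    L_k = np.diag(A_k.sum(axis=1)) - A_k
    return x - eps * (L_k @ x)

adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
x0 = np.array([4.0, 0.0, 2.0, -2.0])
r = x0.mean()
finals = []                                      # (spread of x, offset of the agreed value from r)
for seed in range(3):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(500):
        X = (rng.random(adj.shape) >= 0.3).astype(float)   # independent draw for every direction
        x = uncoded_step(x, adj, X, eps=0.25)
    finals.append((x.max() - x.min(), x.mean() - r))
    print(seed, finals[-1])
```

The spread goes to zero (agreement), while the offset from r is generically nonzero and varies with the seed.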



VI. Background on tree codes 

The problem of achieving consensus over erasure channels is an instance of the problem of simulating interactive
communication protocols among a network of agents over unreliable links. In the specific case of consensus, the
interactive communication protocol amounts to executing (1) at every node. In this context, Rajagopalan et al. [12]
use tree codes to simulate such protocols with a probability of error that vanishes exponentially in the length of the
protocol (a protocol is said to have length m if one needs to execute m iterations of (1)). Another very
important instance of such interactive communication problems is that of stabilizing unstable dynamical systems over
noisy communication channels (cite Sahai here). Although the central role of tree codes in such problems has been
recognized, no practical constructions were known until very recently. In [7], [13], the authors proposed an explicit
ensemble of linear tree codes with efficient decoding for the erasure channel. Equipped with this construction of tree
codes, we can examine more closely how they can be used for specific problems such as consensus over erasure links,
which is what we do here. Before proceeding further, we digress briefly to outline the codes proposed in [13]
and list their relevant properties.

A. Linear time-invariant tree codes 

A tree code is essentially a semi-infinite causal encoding scheme with a certain 'Hamming distance'-like
property. With maximum likelihood decoding over a discrete memoryless channel (DMC), such a
tree code guarantees an error probability that decays exponentially with delay. In other words, the probability of
incorrectly decoding a symbol (or packet) d time steps in the past decays exponentially in d. If the rate of the code is
R < 1, such a causal encoding/decoding scheme with an exponentially decaying probability of error (with exponent β, say)
is said to be (R, β)-anytime reliable. We make this more precise below. We describe the tree codes of (our
work) in terms of their anytime reliability rather than their distance properties, because ultimately it is the
exponent and the rate that matter when communicating over DMCs. Since communication is packetized, let Λ denote the
packet length. Each packet can then be viewed as a symbol from $\mathbb{F}_2^{\Lambda}$. Suppose information is generated at the rate of nR
packets per time instant at the encoder. Then a rate-R time-invariant causal linear code is given by

$$c_t = G_1 b_t + G_2 b_{t-1} + \cdots + G_t b_1, \quad t \ge 1 \tag{12}$$

where $c_t \in \mathbb{F}_2^{n\Lambda}$, $b_t \in \mathbb{F}_2^{nR\Lambda}$ and $G_i \in \mathbb{F}_2^{n\Lambda \times nR\Lambda}$. So, at each time, the encoder receives nR packets and transmits n
packets. Note that this is essentially a convolutional code with infinite memory. At each time t, the decoder generates
estimates $\hat{b}_{\tau|t}$ for $1 \le \tau \le t$, where $\hat{b}_{\tau|t}$ denotes the decoder's estimate of $b_\tau$ based on the channel outputs received up to
time t.
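The encoding rule (12) can be sketched directly over GF(2), with packets represented as lists of bits. This is only the encoder; decoding over the erasure channel (solving linear equations, as discussed in Section VII-B) is not shown. The generator matrices are drawn i.i.d. Bernoulli(1/2), matching the ensemble of [13]; the sizes n = 2, R = 1/2, Λ = 4 and the function names are illustrative assumptions.

```python
import random

def random_generators(T, out_bits, in_bits, rng):
    """Draw G_1, ..., G_T with i.i.d. Bernoulli(1/2) entries, each out_bits x in_bits."""
    return [[[rng.randint(0, 1) for _ in range(in_bits)] for _ in range(out_bits)]
            for _ in range(T)]

def matvec_gf2(G, b):
    """Matrix-vector product over GF(2)."""
    return [sum(g & x for g, x in zip(row, b)) % 2 for row in G]

def encode(G, info):
    """Return [c_1, ..., c_T] with c_t = G_1 b_t + G_2 b_{t-1} + ... + G_t b_1 over GF(2)."""
    out_bits = len(G[0])
    codewords = []
    for t in range(1, len(info) + 1):
        c = [0] * out_bits
        for i in range(1, t + 1):
            term = matvec_gf2(G[i - 1], info[t - i])   # G_i b_{t-i+1} (info is 0-indexed)
            c = [a ^ y for a, y in zip(c, term)]      # addition over GF(2)
        codewords.append(c)
    return codewords

# n = 2 packets per step, R = 1/2, Lambda = 4 bits: b_t has nR*Lambda = 4 bits, c_t has n*Lambda = 8.
rng = random.Random(0)
T = 5
G = random_generators(T, out_bits=8, in_bits=4, rng=rng)
info = [[rng.randint(0, 1) for _ in range(4)] for _ in range(T)]
c = encode(G, info)
```

Because $c_t$ depends only on $b_1, \ldots, b_t$, changing a later information packet leaves every earlier codeword unchanged; this causality is what lets the decoder's estimates of old packets keep improving as new channel outputs arrive.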



Definition 1 (Anytime Reliability): A causal code as in (12) is said to be (R, β)-anytime reliable if

$$P\left(\hat{b}_{\tau|t} \neq b_\tau\right) \le 2^{-\beta(t - \tau + 1)}, \quad \forall\, \tau, t \text{ with } t - \tau \ge d_0 \tag{13}$$

for some fixed $d_0$ independent of t and τ.

Let $p' = p^{1/\Lambda}$. In [13], the authors showed that if the entries of the $G_i$ are drawn i.i.d. Bernoulli(1/2), then almost every
code in this ensemble is (R, β)-anytime reliable for $R < 1 - p'$ and $\beta < n\Lambda E(R)$, where E(R) is an exponent that
depends on the DMC and can be computed explicitly. For the packet erasure channel with erasure probability p,
E(R) is given by (see [13])

$$E(R) = \begin{cases} \quad\vdots \\ 1 - \log(1 + p') - R, & \gamma_1 < R < \gamma_2 \\ \quad\vdots \end{cases} \tag{14}$$

where

$$\gamma_1 = 1 - H\left(\frac{p'}{1 + p'}\right), \qquad \gamma_2 = \frac{\;\cdots\;}{1 + p'} \tag{15}$$

For the rest of the analysis, we assume that we are given an (R, β)-anytime reliable code with $d_0 = 0$.

VII. Main Results 
We present the results separately for the case of symmetric and asymmetric erasures. 

A. Symmetric Link Failures 

Note that the underlying interaction graph $\mathcal{G}$ is fixed, while each link is modeled as a packet erasure channel. The
graph $\mathcal{G}$ is assumed to be connected and the links are undirected. If all agents know that link failures are symmetric,
then each link is effectively a packet erasure channel with feedback: in each communication round, node i knows
that its packet transmission to node j was erased if it receives an erasure from node j in the same round. Recall
that the consensus algorithm in the absence of erasures is given by

$$x_{k+1} = (I - \epsilon\mathcal{L})x_k \tag{16}$$

In particular, node i performs the update

$$x_{k+1}^i = x_k^i - \epsilon\sum_j a_{ij}\left(x_k^i - x_k^j\right) \tag{17}$$

We now define the communication protocol. 

1) The Protocol: A communication round is defined as one in which every node in the graph transmits one packet
to each of its neighbors. The nodes are said to have completed m iterations if all of them have successfully computed m
iterations of (17). Note that this will in general take more than m communication rounds. Since each link is effectively
an erasure channel with feedback, the optimal communication scheme at each node is to retransmit until successful
reception. We describe this more precisely as follows. Let $\mathsf{e}$ denote an erasure. For each edge $j \to i$, we associate an
input queue $Q^{ij}_{\mathrm{in},t}$ and an output queue $Q^{ij}_{\mathrm{out},t}$. The input queue $Q^{ij}_{\mathrm{in},t}$ contains the packets transmitted by node j to node i up to and
including communication round t, while $Q^{ij}_{\mathrm{out},t}$ contains the packets received by node i from node j.



[Fig. 1: contents of the input and output queues at node i for its links with nodes 1 and 2 over communication rounds 1-5, ordered from past to present.]

Fig. 1. Consider an instance of the queues at node i. Suppose its only neighbors are nodes 1 and 2. In round 2, node i receives an erasure
from node 2 and infers that its own transmission to node 2 must also have been erased. As a result, node i retransmits $x_1^i$ to node 2 in round
3. Similarly, in round 3, node i knows that its transmission to node 1 was erased. Since the erased symbol was only a 'wait', node i does not
retransmit it in round 4. Instead, it checks whether it can perform another iteration of (17). In this case it can, and hence it transmits the new data $x_2^i$
to node 1. In round 5, node i has no new data to transmit to node 2 and hence transmits a 'wait'.




Also let $b_t^{ij}$ denote the packet transmitted by node j to node i in communication round t, and let $z_t^{ij}$ denote the
received packet. Then

$$z_t^{ij} = \begin{cases} b_t^{ij} & \text{w.p. } 1 - p \\ \mathsf{e} & \text{w.p. } p \end{cases} \tag{18}$$

Now if $z_t^{ji} = \mathsf{e}$, then node j infers that $b_t^{ij}$ was erased and hence retransmits it in the next communication round, unless
$b_t^{ij}$ was a 'wait' symbol, which we describe as follows. We say that node i has 'new data' if it can compute one
or more new iterations of (17). During communication rounds in which node j does not have any new data to transmit,
it transmits a wait symbol, which we denote by w. The transmission from node i to node j in round t is described
in Algorithm 1. Let $\mathcal{N}_i$ denote the set of neighbors of node i, i.e., $\mathcal{N}_i = \{i' \mid a_{ii'} = 1\}$.

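Algorithm 1 Node i's transmission to node j in round t

if $z_{t-1}^{ij} = \mathsf{e}$ and $b_{t-1}^{ji} \neq w$ then
    set $b_t^{ji} = b_{t-1}^{ji}$, i.e., retransmit
else
    for each $j' \in \mathcal{N}_i$, let $\ell_{t,j'} = \max\{\ell \mid x_\ell^{j'} \in Q^{ij'}_{\mathrm{out},t}\}$
    compute $\ell_t = \min_{j' \in \mathcal{N}_i} \ell_{t,j'}$
    if $\ell_t = \ell_{t-1} + 1$ then
        compute $x^i_{\ell_t+1}$ using (17) and set $b_t^{ji} = x^i_{\ell_t+1}$ (note that $\ell_t \le \ell_{t-1} + 1$)
    else (i.e., $\ell_t = \ell_{t-1}$)
        set $b_t^{ji} = w$
    end if
end if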



The algorithm is illustrated through an example in Fig. 1. Using this algorithm, we obtain the following bounds
on the convergence rate of average consensus.

Theorem 7.1: Let $P_{M,R'}$ denote the probability that the network requires more than M communication rounds
to compute MR' iterations of (17). Further suppose that the packet erasure probability is p and that erasures are
symmetric. Then

$$P_{M,R'} \le N\, 2^{-M\left(D(1 - R' \,\|\, p) - \log(\Delta + 1)\right)} \tag{19}$$

In particular, whenever R' satisfies

$$D(1 - R' \,\|\, p) > \log(\Delta + 1) \tag{20}$$

$P_{M,R'}$ decays exponentially fast in M. Recall that N is the number of nodes and Δ is the maximum degree.

Proof: See Appendix C. ■
Using Theorem 7.1, we can determine the convergence rate $\mu_c^s$ of Algorithm 1; it is given by

$$\mu_c^s = \mu^{R'} \tag{21}$$

where R' is the largest rate such that (20) is satisfied and μ is defined in (7). The subscript and superscript in $\mu_c^s$ indicate
that it is the convergence rate with coding under symmetric erasures. We compare this with the convergence rate
without coding in Section VIII. Let

$$R(p) = \sup_{R' > 0}\left\{R' \mid D(1 - R' \,\|\, p) > \log(\Delta + 1)\right\} \tag{22}$$

Then it is easy to see that R(p) > 0 if and only if p < 1/(1 + Δ): the divergence $D(1 - R' \,\|\, p)$ decreases in R' on
(0, 1 - p) and tends to $\log(1/p)$ as R' → 0. This means that the proof technique used here does not allow us to prove
average consensus if the erasure probability is larger than 1/(1 + Δ). This limitation can be overcome: in fact, one can
show that average consensus is achieved for all 0 < p < 1, as stated in the following result.
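R(p) in (22) is easy to evaluate numerically, because $D(1 - R' \,\|\, p)$ decreases in R' on (0, 1 - p). The sketch below uses bisection; it assumes logarithms in base 2 (matching the $2^{-M(\cdot)}$ form of (19)), and the function names are hypothetical.

```python
from math import log2

def kl_bits(a, q):
    """Binary relative entropy D(a || q) in bits, with the convention 0 log 0 = 0."""
    d = 0.0
    if a > 0:
        d += a * log2(a / q)
    if a < 1:
        d += (1 - a) * log2((1 - a) / (1 - q))
    return d

def rate_bound(p, Delta, iters=100):
    """R(p) = sup{R' > 0 : D(1 - R' || p) > log2(Delta + 1)} as in (22); 0.0 if the set is empty."""
    target = log2(Delta + 1)
    if kl_bits(1.0, p) <= target:      # sup of D over R' is log2(1/p), approached as R' -> 0
        return 0.0
    lo, hi = 0.0, 1.0 - p              # D(1 - R' || p) falls from log2(1/p) to 0 on this interval
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bits(1 - mid, p) > target:
            lo = mid
        else:
            hi = mid
    return lo

print(rate_bound(0.1, 2), rate_bound(0.4, 2))   # second value is 0.0 since 0.4 > 1/3
```

For Δ = 2 the threshold 1/(1 + Δ) is 1/3, so the first rate is positive and the second is zero.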

Theorem 7.2: Let $P_{M,R'}$ denote the probability that the network requires more than M communication rounds
to compute MR' iterations of (17). Further suppose that the packet erasure probability is p and that erasures are
symmetric. Then

$$P_{M,R'} \le N\, 2^{-M D\left(R' \,\middle\|\, \frac{1-p}{|\mathcal{E}|}\right)} \tag{23}$$

In particular, whenever R' satisfies

$$R' < \frac{1-p}{|\mathcal{E}|} \tag{24}$$

$P_{M,R'}$ decays exponentially fast in M. Recall that N is the number of nodes and $|\mathcal{E}|$ is the number of edges in the
network.

Proof: See Appendix E. ■

Combining Theorems 7.1 and 7.2, we conclude that the convergence rate of Algorithm 1, $\mu_c^s$, is given by

(25)



B. Asymmetric Link Failures and Tree Codes

Now suppose packet erasures are not symmetric. Since information at each node is generated one packet at a time
and the unit of communication is a packet, the rate of the code is R = 1/n.¹ Here, one round of communication
corresponds to every pair of neighbors exchanging n packets each. In any communication round, node i does
not know which of the n transmitted packets have been received by each of its neighbors. In this case, we use the
anytime reliable codes described in Section VI-A.



1) The protocol: Consider the pair of nodes i, j and let $b_t^{ji}$ denote the t-th information packet destined to node j
from node i. Then the data actually transmitted by node i is given by

$$c_t^{ji} = \sum_{t'=1}^{t} G_{t-t'+1}\, b_{t'}^{ji} \tag{26}$$

Since the code is (R, β)-anytime reliable, we have $P\left(\hat{b}^{ji}_{t'|t} \neq b^{ji}_{t'}\right) \le 2^{-\beta(t - t' + 1)}$. Since the channel is an erasure channel,
maximum likelihood decoding amounts to solving linear equations. This can be done recursively and efficiently, as
shown in (our paper). Whenever the equations admit a unique solution for some of the variables, those variables are
correctly decoded. We leave the remaining variables as erasures and do not venture a guess about their values. As a
result, the decoder always knows when it has decoded something correctly.

As in the case of repetition coding for symmetric erasures, for each link $j \to i$ we associate two queues, $Q^{ij}_{\mathrm{in},t}$ and
$Q^{ij}_{\mathrm{out},t}$, although with a slightly different meaning. The queue $Q^{ij}_{\mathrm{in},t}$ contains all the information packets transmitted by
node j to node i up to round t, i.e., $Q^{ij}_{\mathrm{in},t} = \{b_\tau^{ij}\}_{\tau \le t}$. On the other hand, $Q^{ij}_{\mathrm{out},t}$ contains node i's estimates of
the information packets transmitted by node j so far, i.e., $Q^{ij}_{\mathrm{out},t} = \{\hat{b}^{ij}_{\tau|t}\}_{\tau \le t}$. Also, it will be evident from Algorithm 2
that $Q^{ji}_{\mathrm{in},t} = Q^{j'i}_{\mathrm{in},t}$ for all $j, j' \in \mathcal{N}_i$, i.e., node i sends the same information packets to all of its neighbors.

With this setup, the mechanics of the protocol are very simple; they are outlined in Algorithm 2.
We can now compute the convergence rate of average consensus achieved by this algorithm, which we state
as the following theorem.

Theorem 7.3: Let $P_{M,R'}$ denote the probability that the network requires more than M communication rounds
to compute MR' iterations of (17). Further suppose that the packet erasure probability is p, that erasures are
asymmetric, and that each node uses an (R, β)-anytime reliable code. Then

$$P_{M,R'} \le N\, 2^{-M\left((1 - R')\beta/2 - H(R') - \log(\Delta + 1)\right)} \tag{27}$$

In particular, whenever R' satisfies

$$(1 - R')\beta/2 > H(R') + \log(\Delta + 1) \tag{28}$$

$P_{M,R'}$ decays exponentially fast in M.

Proof: See Appendix D. ■

Algorithm 2 Node i's transmission to its neighbors in round t

for each $j' \in \mathcal{N}_i$, compute $\ell_{t,j'} = \max\{\ell' \mid x^{j'}_{\ell'} \in Q^{ij'}_{\mathrm{out},t}\}$ and let $\ell_t = \min_{j' \in \mathcal{N}_i} \ell_{t,j'}$
also compute $m_{t,j'} = \max\{m' \mid x^i_{m'} \in Q^{j'i}_{\mathrm{in},t}\}$ and let $m_t = \min_{j' \in \mathcal{N}_i} m_{t,j'}$
if $\ell_t + 1 > m_{t-1}$ then
    compute $x^i_{\ell_t+1}$ using (17) and set $b_t^{ji} = x^i_{\ell_t+1}$ for all $j \in \mathcal{N}_i$
else
    set $b_t^{ji} = w$ for all $j \in \mathcal{N}_i$
end if

As in the symmetric case, the convergence rate $\mu_c^a$ using tree codes is given by

$$\mu_c^a = \mu^{R'} \tag{29}$$

where R' is the largest rate such that (28) is satisfied and μ is as in (7). Let

$$R(\beta) = \sup_{R' > 0}\left\{R' \mid (1 - R')\beta/2 > H(R') + \log(\Delta + 1)\right\} \tag{30}$$

Then, much as in Section VII-A, it is easy to see that R(β) > 0 if and only if β > 2 log(1 + Δ).

¹ This kind of rate arises because we quantize each number $x_k^i$ to fit into one packet. One can instead quantize it more finely into multiple
packets, say k, in which case R = k/n.
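Similarly, R(β) in (30) can be computed by bisection. The function $g(R') = (1 - R')\beta/2 - H(R') - \log(\Delta + 1)$ is convex (since H is concave) with $g(1) < 0$, so the set where it is positive is either empty or an interval starting at 0. The sketch assumes base-2 logarithms; the function names are hypothetical.

```python
from math import log2

def binary_entropy(x):
    """H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def tree_code_rate(beta, Delta, iters=100):
    """R(beta) from (30): sup{R' > 0 : (1 - R') beta / 2 > H(R') + log2(Delta + 1)}."""
    def g(R):
        return (1 - R) * beta / 2 - binary_entropy(R) - log2(Delta + 1)
    if g(0.0) <= 0:        # g convex with g(1) < 0: {g > 0} is empty or [0, R*)
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo

print(tree_code_rate(4.0, 2), tree_code_rate(3.0, 2))   # threshold 2*log2(3) is about 3.17
```

With Δ = 2, β = 4 exceeds the threshold $2\log(1 + \Delta)$ and yields a positive rate, while β = 3 does not and yields 0.0.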



VIII. Discussion - Coding vs. No Coding

When there is no coding, the consensus recursion is given by (9). We begin with the case of symmetric erasures.



A. Symmetric Erasures 

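The convergence rate of (9) when erasures are symmetric is given by the following lemma.

Lemma 8.1 (Symmetric Erasures): When the erasures are symmetric and i.i.d. over time and space, the convergence
rate of (9), which we define as

$$\mu_{\bar{c}}^s = \sup_{x_0 \neq r\mathbf{1}} \lim_{k\to\infty}\left(\frac{E\|x_k - r\mathbf{1}\|^2}{\|x_0 - r\mathbf{1}\|^2}\right)^{1/(2k)} \tag{31}$$

is given by

$$\mu_{\bar{c}}^s = \sqrt{|\lambda_2(\Gamma_s)|} \tag{32}$$

where $\Gamma_s = E[(I - \epsilon\mathcal{L}_0)\otimes(I - \epsilon\mathcal{L}_0)]$ is a deterministic matrix that is a function of ε, p and $\mathcal{L}$ and can be computed explicitly
in closed form, and $\lambda_2(\Gamma_s)$ denotes its second largest eigenvalue in magnitude. The subscript $\bar{c}$ indicates that there is no coding,
and the superscript s (as in $\Gamma_s$) indicates that the erasures are symmetric.

Proof: See Appendix A. ■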
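$\Gamma_s$ admits a short closed form. Writing $I - \epsilon\mathcal{L}_0 = I - \epsilon\sum_e X_e\mathcal{L}_e$ over undirected links e, where $\mathcal{L}_e$ is the Laplacian of the single link e and $X_e$ is Bernoulli(q) with $q = 1 - p$, independence across links gives $\Gamma_s = \bar W \otimes \bar W + \epsilon^2 q(1-q)\sum_e \mathcal{L}_e\otimes\mathcal{L}_e$ with $\bar W = I - \epsilon q\mathcal{L}$. The sketch below, assuming NumPy, builds $\Gamma_s$ this way, evaluates the right-hand side of (40) in Appendix A, and evaluates the expression in (32) as stated. The function names and example parameters are hypothetical.

```python
import numpy as np

def gamma_s(adj, p, eps):
    """Closed form of Gamma_s = E[(I - eps L_0) kron (I - eps L_0)] for symmetric i.i.d. erasures."""
    adj = np.asarray(adj, dtype=float)
    N, q = adj.shape[0], 1.0 - p                 # q = probability that a link is up
    L = np.diag(adj.sum(axis=1)) - adj
    EW = np.eye(N) - eps * q * L                 # E[I - eps L_0]
    G = np.kron(EW, EW)
    for i in range(N):
        for j in range(i + 1, N):
            if adj[i, j]:
                d = np.zeros(N)
                d[i], d[j] = 1.0, -1.0
                Le = np.outer(d, d)              # Laplacian of the single link (i, j)
                G += eps**2 * q * (1 - q) * np.kron(Le, Le)   # variance term of X_e
    return G

def mse_via_40(adj, p, eps, x0, k):
    """((x0 - r1)^T kron (x0 - r1)^T) Gamma_s^k vec(I), i.e. E||x_k - r1||^2."""
    e = np.asarray(x0, dtype=float) - np.mean(x0)
    G = gamma_s(adj, p, eps)
    return np.kron(e, e) @ np.linalg.matrix_power(G, k) @ np.eye(len(e)).reshape(-1)

def rate_32(adj, p, eps):
    """The expression sqrt(|lambda_2(Gamma_s)|) of (32)."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(gamma_s(adj, p, eps))))[::-1]
    return np.sqrt(mags[1])

C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(rate_32(C4, 0.2, 0.25), mse_via_40(C4, 0.2, 0.25, [4.0, 0.0, 2.0, -2.0], 10))
```

Because every column of $\mathcal{L}_e$ sums to zero, the variance terms do not change the column sums, so $\Gamma_s$ stays doubly stochastic as used in Appendix A.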
Consider now the case of coding in the presence of symmetric erasures. From Theorem 7.1 and (8), it is easy to see
that the convergence rate is given by $\mu_c^s$ in (21). So, whenever $\mu_c^s < \mu_{\bar{c}}^s$, coding offers an advantage. We state this as
a theorem.

Theorem 8.2: In the case of symmetric erasures, coding offers faster convergence than (9) whenever there is an
R' > 0 such that

$$(1 - R')\log\frac{1}{p} > \log(\Delta + 1) + H(R') \tag{33a}$$

$$\left(\rho\left(I - \epsilon\mathcal{L} - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T\right)\right)^{R'} < \sqrt{|\lambda_2(\Gamma_s)|} \tag{33b}$$



B. Asymmetric Erasures 

As mentioned in Section [V| when link failures are asymmetric, the algorithm of ([9]) does not achieve average 
consensus. Nevertheless the nodes reach agreement and the rate of convergence to agreement has been characterized 
in [14]. Here, we characterize the mean squared error of the state from average consensus. 

Lemma 8.3 (Asymmetric Erasures): When the erasures are asymmetric and i.i.d. over time and space, we have

$$E\|x_k - r\mathbf{1}\|^2 = \left((x_0 - r\mathbf{1})^T \otimes (x_0 - r\mathbf{1})^T\right)\Gamma_a^k\, \mathrm{vec}(I) \tag{34}$$

Here I is the N × N identity matrix and

$$\Gamma_a = E[(I - \epsilon\mathcal{L}_0^T)\otimes(I - \epsilon\mathcal{L}_0^T)] \tag{35}$$

is a deterministic matrix that is a function of ε, p and $\mathcal{L}$ and can be computed explicitly in closed form.
Furthermore, $\rho(\Gamma_a) = 1$.

Proof: See Appendix B. ■



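Note that $\mathbf{1}^T\Gamma_a = \mathbf{1}^T$, but $\Gamma_a\mathbf{1} \neq \mathbf{1}$. Let c be the right eigenvector of $\Gamma_a$ corresponding to the eigenvalue 1, normalized
so that $\mathbf{1}^T c = 1$, i.e., $\Gamma_a c = c$. Then, assuming that 1 is a simple eigenvalue and all other eigenvalues of $\Gamma_a$ lie strictly inside the
unit circle, $\lim_{k\to\infty}\Gamma_a^k = c\mathbf{1}^T$. Using this in (34), and noting that $\mathbf{1}^T\mathrm{vec}(I) = N$, we get

$$\lim_{k\to\infty} E\|x_k - r\mathbf{1}\|^2 = N\left((x_0 - r\mathbf{1})^T \otimes (x_0 - r\mathbf{1})^T\right)c \tag{36}$$

which is nonzero in general. This shows that average consensus cannot, in general, be achieved without coding when link
failures are asymmetric. Thus, a major benefit of using tree codes in such cases is that they guarantee average consensus.
Furthermore, tree codes can be used to implement any distributed protocol over a network with erasure links.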
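The same approach gives $\Gamma_a$ in closed form. Writing $\mathcal{L}_0 = \sum_{i,j} a_{ij}X^{ij}e_i(e_i - e_j)^T$, each ordered pair (i, j) contributes an independent Bernoulli(q) term, so $\Gamma_a = \bar W^T\otimes\bar W^T + \epsilon^2 q(1-q)\sum_{i,j} a_{ij}B_{ij}^T\otimes B_{ij}^T$ with $B_{ij} = e_i(e_i - e_j)^T$ and $\bar W = I - \epsilon q\mathcal{L}$. The sketch below, assuming NumPy, builds $\Gamma_a$, extracts the eigenvector c with $\Gamma_a c = c$ normalized so that $\mathbf{1}^T c = 1$, and evaluates $N\left((x_0 - r\mathbf{1})^T\otimes(x_0 - r\mathbf{1})^T\right)c$. It assumes the eigenvalue 1 is simple; function names and parameters are illustrative.

```python
import numpy as np

def gamma_a(adj, p, eps):
    """Closed form of Gamma_a = E[(I - eps L_0^T) kron (I - eps L_0^T)] for asymmetric erasures."""
    adj = np.asarray(adj, dtype=float)
    N, q = adj.shape[0], 1.0 - p
    L = np.diag(adj.sum(axis=1)) - adj
    EWt = (np.eye(N) - eps * q * L).T                  # E[I - eps L_0]^T
    G = np.kron(EWt, EWt)
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                B = np.zeros((N, N))
                B[i, i], B[i, j] = 1.0, -1.0           # e_i (e_i - e_j)^T
                G += eps**2 * q * (1 - q) * np.kron(B.T, B.T)
    return G

def limiting_mse(adj, p, eps, x0):
    """N ((x0 - r1)^T kron (x0 - r1)^T) c with Gamma_a c = c and 1^T c = 1."""
    vals, vecs = np.linalg.eig(gamma_a(adj, p, eps))
    c = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    c = c / c.sum()
    e = np.asarray(x0, dtype=float) - np.mean(x0)
    return len(e) * (np.kron(e, e) @ c)

C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(limiting_mse(C4, 0.2, 0.25, [4.0, 0.0, 2.0, -2.0]))   # strictly positive
```

A positive value means the mean squared distance from $r\mathbf{1}$ does not vanish, even though the nodes agree in the limit.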

Appendix 



A. Proof of Lemma 8.1 



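Note that $\mathcal{L}_k\mathbf{1} = 0$ whether or not the erasures are symmetric. Recall that $r = \frac{1}{N}\mathbf{1}^T x_0$. Then

$$x_k - r\mathbf{1} = (I - \epsilon\mathcal{L}_{k-1})(x_{k-1} - r\mathbf{1}) \tag{37a}$$

$$x_k - r\mathbf{1} = \underbrace{(I - \epsilon\mathcal{L}_{k-1})(I - \epsilon\mathcal{L}_{k-2})\cdots(I - \epsilon\mathcal{L}_0)}_{Y_k}(x_0 - r\mathbf{1}) \tag{37b}$$

so that

$$E\|x_k - r\mathbf{1}\|^2 = (x_0 - r\mathbf{1})^T E[Y_k^T Y_k](x_0 - r\mathbf{1}) = \left((x_0 - r\mathbf{1})^T \otimes (x_0 - r\mathbf{1})^T\right)\mathrm{vec}(P_k) \tag{38}$$

where $P_k = E[Y_k^T Y_k]$. Recall that the erasure process is independent over time and across links. Conditioning on $\mathcal{L}_0$
and using the fact that the product of the remaining k - 1 factors has the same distribution as $Y_{k-1}$, we have

$$P_k = E[(I - \epsilon\mathcal{L}_0^T)P_{k-1}(I - \epsilon\mathcal{L}_0)] \tag{39a}$$

$$\mathrm{vec}(P_k) = \Gamma_s\,\mathrm{vec}(P_{k-1}), \quad \text{where} \tag{39b}$$

$$\Gamma_s = E[(I - \epsilon\mathcal{L}_0^T)\otimes(I - \epsilon\mathcal{L}_0^T)] \tag{39c}$$

Since the erasures are symmetric, $\mathcal{L}_0^T = \mathcal{L}_0$. Furthermore, $\mathrm{vec}(P_k) = \Gamma_s^k\,\mathrm{vec}(I)$, where I is the N × N identity
matrix. Putting (38) and (39) together, we get

$$E\|x_k - r\mathbf{1}\|^2 = \left((x_0 - r\mathbf{1})^T \otimes (x_0 - r\mathbf{1})^T\right)\Gamma_s^k\,\mathrm{vec}(I) \tag{40}$$

So, the rate of convergence of the consensus algorithm in the absence of coding is determined by $\Gamma_s$. Observe
that $\Gamma_s$ is doubly stochastic, i.e., $\mathbf{1}^T\Gamma_s = \mathbf{1}^T$ and $\Gamma_s\mathbf{1} = \mathbf{1}$. It has one eigenvalue at 1, and all others are strictly
smaller than 1 in magnitude. Let $\lambda_2(\Gamma_s)$ denote the second largest eigenvalue in magnitude. Then clearly

$$\lim_{k\to\infty}\Gamma_s^k = \frac{1}{N^2}\mathbf{1}\mathbf{1}^T \tag{41}$$

and the rate of convergence is given by

$$\sqrt{|\lambda_2(\Gamma_s)|} \tag{42}$$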



B. Proof of Lemma 8.3 



Except for the claim $\rho(\Gamma_a) = 1$, everything else follows as in Appendix A. Recall that the random variable $X_0^{ij}$ is defined as
$X_0^{ij} = 0$ if the link $j \to i$ is erased at time 0 and $X_0^{ij} = 1$ otherwise. For brevity, we write $X^{ij}$ instead of $X_0^{ij}$.
Then it is easy to verify that $\mathcal{L}_0$ can be written as

$$\mathcal{L}_0 = \sum_{i,j} a_{ij}X^{ij}\, e_i(e_i - e_j)^T \tag{43}$$

where $e_i$ is the i-th unit vector. In particular, the underlying Laplacian in the absence of any erasures can be written
as $\mathcal{L} = \sum_{i,j} a_{ij}\, e_i(e_i - e_j)^T$.

Now assume, as in Section III, that $\epsilon \le 1/\Delta$. By (43), the off-diagonal entries of $I - \epsilon\mathcal{L}_0$ are $\epsilon a_{ij}X^{ij} \ge 0$ and its
diagonal entries are $1 - \epsilon\sum_j a_{ij}X^{ij} \ge 1 - \epsilon\Delta \ge 0$; moreover, $(I - \epsilon\mathcal{L}_0)\mathbf{1} = \mathbf{1}$. Hence, for every erasure realization,
$I - \epsilon\mathcal{L}_0$ is a nonnegative matrix whose rows sum to one, and so is the Kronecker product $(I - \epsilon\mathcal{L}_0)\otimes(I - \epsilon\mathcal{L}_0)$.
Taking expectations, the same holds for $\Gamma_a^T = E[(I - \epsilon\mathcal{L}_0)\otimes(I - \epsilon\mathcal{L}_0)]$. Therefore $\rho(\Gamma_a) = \rho(\Gamma_a^T) \le \|\Gamma_a^T\|_\infty = 1$,
and since $\Gamma_a^T\mathbf{1} = \mathbf{1}$, the value 1 is an eigenvalue. Hence $\rho(\Gamma_a) = 1$. This completes the proof.



C. Proof of Theorem 7.1

We begin by identifying the state of the protocol in Algorithm 1. For the sake of clarity, we refer to nodes
using letters u, v, etc., instead of i, j. Recall that $\mathcal{N}_v$ denotes the set of neighbors of v. For each node v at time t
(i.e., after round t), we associate $|\mathcal{N}_v|$ variables $\{n_{vu}(t)\}_{u\in\mathcal{N}_v}$, where $n_{vu}(t)$ denotes the latest iterate of node u that
is available to node v at time t. In other words, $n_{vu}(t)$ is the largest integer τ such that $x^u_\tau$ is available to node v. We
further define

$$n_v(t) = 1 + \min_{u\in\mathcal{N}_v} n_{vu}(t) \tag{46}$$

Note that $n_v(t)$ is the latest iteration of (17) that node v can compute at time t. In other words, node v has computed
$\{x^v_\tau\}_{\tau \le n_v(t)}$ and no more. With this setup, it is clear that Algorithm 1 has executed $\min_v n_v(t)$ iterations of
(17) by time t. The rate of the protocol is then given by $R = \lim_{t\to\infty}\frac{\min_v n_v(t)}{t}$, which is a random variable
for a specific run of the protocol. We now state the evolution of $n_{vu}(t)$ as a lemma.

Lemma 1.1: Let $X_t^{vu} = 1$ if the edge $(v,u)$ is erased in round $t$ and $X_t^{vu} = 0$ otherwise. Then the evolution of $n_{vu}(t)$ is given by the following equation:

$$n_{vu}(t+1) = n_{vu}(t) + \left(1 - X_{t+1}^{vu}\right) \mathbb{1}\left[n_u(t) > n_{vu}(t)\right] \tag{47}$$
Proof: The proof follows from the following simple observations:

1) $n_{vu}(t)$ increases by at most 1 in each step.

2) In any round, if node $u$ receives an erasure on a link, it will infer that its transmission on that link was also erased. As a result, node $u$ has knowledge of $n_{vu}(t)$ at all times $t$.

3) In round $t+1$, if either the edge $(v,u)$ is erased or node $u$ sends a 'wait' $w$ to node $v$, then $n_{vu}(t+1) = n_{vu}(t)$.

4) Node $u$ sends a 'wait' $w$ to node $v$ in round $t+1$ if and only if $n_{vu}(t) = n_u(t)$.

■
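The bookkeeping in (46) and (47) can be simulated directly. The sketch below assumes symmetric i.i.d. erasures with probability $p$ on each undirected edge and the hypothetical initial condition $n_{vu}(0) = -1$ (no neighbor iterate available yet), so that $n_v(0) = 0$. It models only the counters, not the packet contents of Algorithm 1, and returns the trajectory of $\min_v n_v(t)$.

```python
import random

def simulate_counters(adj, p, rounds, seed=0):
    """Evolve n_vu(t) by (47) and n_v(t) by (46) under symmetric i.i.d. erasures.

    adj maps each node to the list of its neighbors (undirected, no isolated nodes).
    Returns [min_v n_v(0), min_v n_v(1), ..., min_v n_v(rounds)].
    """
    rng = random.Random(seed)
    # Assumed initial condition: no neighbor iterate is available yet, so n_v(0) = 0.
    n_vu = {(v, u): -1 for v in adj for u in adj[v]}

    def n(v):  # (46)
        return 1 + min(n_vu[(v, u)] for u in adj[v])

    history = [min(n(v) for v in adj)]
    for _ in range(rounds):
        erased = {}
        for v in adj:
            for u in adj[v]:
                # one coin per undirected edge, so both directions agree
                if (u, v) in erased:
                    erased[(v, u)] = erased[(u, v)]
                else:
                    erased[(v, u)] = rng.random() < p
        n_now = {v: n(v) for v in adj}  # n_u(t), frozen before the update
        for (v, u) in n_vu:
            # (47): advance iff the link is up and u is not sending a 'wait'
            if not erased[(v, u)] and n_now[u] > n_vu[(v, u)]:
                n_vu[(v, u)] += 1
        history.append(min(n(v) for v in adj))
    return history
```

For instance, on the three-node path `{0: [1], 1: [0, 2], 2: [1]}`, the ratio `h[-1] / 1000` for `h = simulate_counters(path, 0.2, 1000)` estimates $R$ for that single run; without erasures the counters advance by one per round.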

We say that round $t$ got wasted at node $v$ if $n_v(t-1) = n_v(t)$, i.e., node $v$ could not perform a new iteration of (17) at time $t$. The proof idea is as follows: for each node $v$ at time $t$, we will argue that there exists a sequence of $t$ edges of which at least $t - n_v(t)$ edges have failed. We then union bound over all possible choices of such $t$ edges.

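Before proceeding further, we define an object which we call the 'trellis', for lack of a better word. Associated to any undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represented by the adjacency matrix $A$, we define an infinite trellis $\mathcal{T}(\mathcal{G}) = (\mathcal{V}_\mathcal{T}, \mathcal{E}_\mathcal{T})$ as follows. Associated to each node $v$ in $\mathcal{V}$, there are countably infinitely many copies $\{v_k\}_{k \ge 0}$ in $\mathcal{V}_\mathcal{T}$. Let $I$ denote the $|\mathcal{V}| \times |\mathcal{V}|$ identity matrix. Then the nodes $\mathcal{V}_\mathcal{T}$ and edges $\mathcal{E}_\mathcal{T}$ of $\mathcal{T}(\mathcal{G})$ are given by

$$\mathcal{V}_\mathcal{T} = \bigcup_{v \in \mathcal{V}} \bigcup_{k \ge 0} \{v_k\} \tag{48a}$$

$$\mathcal{E}_\mathcal{T} = \left\{ (v_\tau, u_{\tau'}) \;\middle|\; |\tau - \tau'| = 1,\ (A + I)_{vu} = 1 \right\} \tag{48b}$$

The edges in $\mathcal{E}_\mathcal{T}$ are all undirected, i.e., $(u_0, v_1)$ and $(v_1, u_0)$ are treated as a single edge. The trellis for an example network is given in Fig. 2.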
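For the three-node path of Fig. 2, the trellis can be enumerated up to a finite depth as in the sketch below. The node labels 0, 1, 2, the representation of the copy $v_k$ as the pair `(v, k)`, and the truncation depth are assumptions for illustration; the trellis itself is infinite.

```python
def trellis_edges(adj, depth):
    """Edges of the trellis T(G) from (48b), truncated to the copies v_0, ..., v_depth.

    {v_tau, u_(tau-1)} is an edge iff (A + I)_vu = 1, i.e. u == v or u is a neighbor
    of v. Each undirected edge is stored once, as ((v, tau), (u, tau - 1)).
    """
    edges = set()
    for tau in range(1, depth + 1):
        for v in adj:
            for u in [v] + list(adj[v]):
                edges.add(((v, tau), (u, tau - 1)))
    return edges

def trellis_nodes(adj, depth):
    """Copies v_k for k = 0, ..., depth, as in (48a)."""
    return {(v, k) for v in adj for k in range(depth + 1)}
```

Between two consecutive layers there are $\sum_v (\deg v + 1)$ edges, which is 7 for the three-node path.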



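Definition 2 (time-like): Any sequence of edges (or a path) $\mathcal{S}_t$ in the trellis $\mathcal{T}(\mathcal{G})$ of the type

$$\left\{ \left(u_t^{(t)}, u_{t-1}^{(t-1)}\right), \left(u_{t-1}^{(t-1)}, u_{t-2}^{(t-2)}\right), \ldots, \left(u_1^{(1)}, u_0^{(0)}\right) \right\}$$

with $u^{(t)} = v$ will be called 'time-like' ending in node $v_t$.

An edge $\left(u_\tau^{(\tau)}, u_{\tau-1}^{(\tau-1)}\right) \in \mathcal{E}_\mathcal{T}$ is said to be erased if there was an erasure on the edge $\left(u^{(\tau)}, u^{(\tau-1)}\right) \in \mathcal{E}$ in round $\tau$. The time-like sequence $\mathcal{S}_t$ is said to have $\ell$ erasures if $\ell$ of the $t$ edges in $\mathcal{S}_t$ were erased. We are now ready to state the key lemma from which the proof of Theorem 7.1 follows easily.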



Lemma 1.2: If after $t$ rounds of communication node $v$ has performed $n_v(t)$ iterations of (17), then there exists a time-like sequence of $t$ edges ending in node $v_t$ that has at least $t - n_v(t)$ erasures among them.



We will first prove Theorem 7.1 using Lemma 1.2. Suppose after $t$ communication rounds node $v$ performed $Rt$ iterations of (17), for some $R < 1 - p$. Recall that the probability of an erasure is $p$. Then there must be a time-like sequence of $t$ edges with at least $(1 - R)t$ erasures. For a fixed sequence, by the Chernoff bound this has probability at most $2^{-tD(1-R \| p)}$, where $D(q \| p) = q \log(q/p) + (1-q) \log\frac{1-q}{1-p}$. Now there are at most $(\Delta + 1)^t$ choices of such time-like sequences. Then, doing a union bound over all these sequences and over the $N$ nodes, we get

$$P_{R,t} \le N (\Delta + 1)^t \, 2^{-tD(1-R \| p)} \tag{49}$$

where $P_{R,t}$ is the probability that the network performed $Rt$ or fewer iterations of (17) in $t$ rounds and $N$ is the number of nodes in the network. This is the claim in Theorem 7.1. We will now prove the lemma.
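The counting step and the bound (49) can be evaluated directly. In the sketch below, a time-like sequence of $t$ edges ending in $v_t$ is identified with a walk of length $t$ from $v$ in the graph with adjacency $A + I$, and $\Delta$ is taken to be the maximum degree of $\mathcal{G}$ (our reading of the notation). The bound is computed in the $\log_2$ domain; it decays with $t$ only when $D(1-R\|p) > \log_2(\Delta+1)$.

```python
import math

def count_time_like(adj, v, t):
    """Number of time-like sequences of t edges ending in v_t.

    Equals the number of walks of length t from v in the graph with adjacency
    A + I: at each step the walk either stays put or moves to a neighbor.
    """
    counts = {u: 1 for u in adj}  # walks of length 0 starting at u
    for _ in range(t):
        counts = {u: counts[u] + sum(counts[w] for w in adj[u]) for u in adj}
    return counts[v]

def kl_bits(q, p):
    """Binary divergence D(q || p) in bits, with the convention 0 log 0 = 0."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total

def log2_bound_49(N, max_deg, p, R, t):
    """log2 of the right-hand side of (49): N (Delta + 1)^t 2^{-t D(1 - R || p)}."""
    return math.log2(N) + t * math.log2(max_deg + 1) - t * kl_bits(1 - R, p)
```

On the three-node path, `count_time_like(path, 1, 2)` is 7, below the union-bound count $(\Delta+1)^2 = 9$.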



Proof (of Lemma 1.2):



For ease of presentation, we introduce the following notation for the rest of the proof.

a) We will refer to any time-like sequence of $\tau$ edges ending in $v_\tau$ that has $\tau - n_v(\tau)$ or more erasures as a "witness" at $v_\tau$.

b) We will call a node $u \in \mathcal{N}_v$ a "bottleneck" for node $v$ in round $t$ iff $n_{vu}(t-1) = n_v(t-1) - 1$, i.e., $n_{vu}(t-1) = \min_{u' \in \mathcal{N}_v} n_{vu'}(t-1)$.

The lemma claims that there is a witness at $v_t$ for all $v \in \mathcal{V}$ and $t \ge 0$. We will prove this by induction. The hypothesis is clearly true for $t = 0$. Suppose it is true for all nodes $u \in \mathcal{V}$ and all $\tau \le t - 1$. Recall that round $t$ at node $v$ is wasted if $n_v(t-1) = n_v(t)$. There are two broad cases: round $t$ gets wasted at node $v$ or it does not.

1) Suppose round $t$ is not wasted, i.e., $n_v(t) = n_v(t-1) + 1$. Then by the induction hypothesis, there is a witness at $v_{t-1}$. Appending the edge $(v_t, v_{t-1})$ to this witness gives us a witness for $v_t$.

2) It remains to consider the case where round $t$ gets wasted at node $v$, i.e., $n_v(t) = n_v(t-1)$.



We will divide case 2) above into two sub-cases: a) there exists $u \in \mathcal{N}_v$ such that $n_u(t-1) = n_v(t-1) - 1$, and b) such a neighbor does not exist.

a) If there is a neighbor $u \in \mathcal{N}_v$ such that $n_u(t-1) = n_v(t-1) - 1$, then the witness for $v_t$ is obtained by appending the edge $(v_t, u_{t-1})$ to the witness at $u_{t-1}$.

b) Here $n_u(t-1) \ge n_v(t-1)$ for all $u \in \mathcal{N}_v$. Since $|n_u(\tau) - n_v(\tau)| \le 1$ for any $\tau$, we can partition the neighbors of $v$ into two classes $Y = \{u \in \mathcal{N}_v \mid n_u(t-1) = n_v(t-1)\}$ and $Z = \{u \in \mathcal{N}_v \mid n_u(t-1) = n_v(t-1) + 1\}$. Furthermore, let $B = \{u \in \mathcal{N}_v \mid n_{vu}(t-1) = n_v(t-1) - 1\}$ denote the bottlenecks for $v$ in round $t$.

We will further divide case b) above into two sub-cases: i) $B \cap Z = \emptyset$ and ii) $B \cap Z \neq \emptyset$.

i) $B \cap Z = \emptyset$, i.e., there are no bottlenecks in the set of neighbors $Z$. Observe that a bottleneck neighbor will not send a wait $w$. Also, for any $u \in B \cap Y$, $n_{vu}(t-1) = n_v(t-1) - 1 = n_u(t-1) - 1$. So the data transmitted by node $u$ to node $v$ in round $t$ is $x_u^{(n_u(t-1))}$, i.e., iteration $n_u(t-1)$ of (17). Since round $t$ at node $v$ got wasted, at least one of the edges to a bottleneck neighbor must have been erased in round $t$. Otherwise, node $v$ would have been able to compute a new iteration of (17) and the round would not have been wasted. Suppose the erasure happened on edge $(v,u)$ for some $u \in B \cap Y$. Then appending edge $(v_t, u_{t-1})$ to the witness at $u_{t-1}$ gives us the witness at $v_t$.

ii) $B \cap Z \neq \emptyset$, i.e., there is a neighbor $u \in B \cap Z$ such that $n_u(t-1) = n_v(t-1) + 1$ and $n_{vu}(t-1) = n_v(t-1) - 1 = n_u(t-1) - 2$. Furthermore, there must be a neighbor $u \in B \cap Z$ whose transmission to $v$ in round $t$ was erased (else there must be an edge to $B \cap Y$ which was erased, and we revert to case i)). Note that $n_u(t-2) \ge n_v(t-1)$. It follows from Lemma 1.1 that node $u$ must have transmitted iteration $n_v(t-1)$ in round $t-1$ as well as in round $t$, and both transmissions were erased, since $n_{vu}(t) = n_{vu}(t-1) = n_v(t-1) - 1$. Since this erasure model considers symmetric erasures, the transmission from $v$ to $u$ in round $t-1$ is also erased. Appending the edges $(v_t, u_{t-1})$ and $(u_{t-1}, v_{t-2})$ to the witness at $v_{t-2}$ gives us the witness for $v_t$.

This completes the proof of Lemma 1.2. ■
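Lemma 1.2 can also be checked empirically. The sketch below simulates the counter dynamics (46) and (47) under symmetric i.i.d. erasures with probability $p$ and the hypothetical initial condition $n_{vu}(0) = -1$, and runs a dynamic program over the trellis: `best[v]` is the largest number of erasures on any time-like sequence ending in $v_t$, where the edge $(v_t, v_{t-1})$ is never erased and $(v_t, u_{t-1})$ is erased iff $(v,u)$ was erased in round $t$. The lemma predicts `best[v] >= t - n_v(t)` at every node and round.

```python
import random

def lemma_1_2_holds(adj, p, rounds, seed=0):
    """Simulate (46)-(47) and test the conclusion of Lemma 1.2 at every node and round."""
    rng = random.Random(seed)
    n_vu = {(v, u): -1 for v in adj for u in adj[v]}  # assumed n_vu(0) = -1

    def n(v):  # (46)
        return 1 + min(n_vu[(v, u)] for u in adj[v])

    best = {v: 0 for v in adj}  # max erasures on a time-like sequence ending in v_0
    for t in range(1, rounds + 1):
        erased = {}
        for v in adj:
            for u in adj[v]:
                # symmetric erasures: reuse the coin of the reverse direction
                erased[(v, u)] = erased[(u, v)] if (u, v) in erased else rng.random() < p
        n_prev = {v: n(v) for v in adj}
        for (v, u) in n_vu:  # (47)
            if not erased[(v, u)] and n_prev[u] > n_vu[(v, u)]:
                n_vu[(v, u)] += 1
        # Extend time-like sequences by one trellis edge (v_t, u_(t-1)), u in {v} + N_v.
        best = {v: max([best[v]] + [best[u] + int(erased[(v, u)]) for u in adj[v]])
                for v in adj}
        if any(best[v] < t - n(v) for v in adj):
            return False
    return True
```

The dynamic program takes a maximum over all time-like sequences, so it dominates each witness constructed in the case analysis above.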



D. Proof of Theorem 7.3 



We will begin the proof with three preliminary results before moving to the main argument. Recall that an $(R, \beta)$-anytime reliable code is one that guarantees $P\left(\hat{b}_{\tau|t} \neq b_\tau\right) \le 2^{-\beta(t - \tau + 1)}$. For such a code that is linear, we can say the following.

Lemma 1.3: Suppose $\{b_i\}_{i \ge 0}$ are encoded and decoded using a causal linear $(R, \beta)$-anytime reliable code. Consider the events $Y(\tau_1', \tau_1)$: $\tau_1' = 1 + \arg\max_\ell \{\hat{b}_{\le \ell|\tau_1} = b_{\le \ell}\}$ and $Y(\tau_2', \tau_2)$: $\tau_2' = 1 + \arg\max_\ell \{\hat{b}_{\le \ell|\tau_2} = b_{\le \ell}\}$, i.e., $Y(\tau_i', \tau_i)$ is the event that at decoding instant $\tau_i$ the position of the earliest error is at $\tau_i'$, for $i = 1, 2$. Furthermore, suppose that the intervals $[\tau_1', \tau_1]$ and $[\tau_2', \tau_2]$ are disjoint. Then we have

$$P\left(Y(\tau_1', \tau_1) \cap Y(\tau_2', \tau_2)\right) \le 2^{-\beta\left[(\tau_1 - \tau_1' + 1) + (\tau_2 - \tau_2' + 1)\right]} \tag{50}$$

The probability above is only over the randomness of the channel.

Fig. 2. (a) An example network; (b) the trellis associated to the network in (a). This depicts the trellis associated to a network of three nodes connected in a straight line. The thick lines represent edges.

Proof: Without loss of generality, assume that $\tau_2' > \tau_1$. Due to linearity, we can also assume without loss of generality that the input $b_i = 0$ for $i \ge 0$. Let $E_i$ denote the portion of the erasure pattern introduced by the channel during the interval $[\tau_i', \tau_i]$ that resulted in the event $Y(\tau_i', \tau_i)$. Then we claim that $P(E_i) \le 2^{-\beta(\tau_i - \tau_i' + 1)}$. This follows from the simple observation that if the encoder input in the first $\tau_i - \tau_i' + 1$ instants is all zero and the corresponding channel erasure pattern is $E_i$, then $Y(\tau_i', \tau_i)$ implies that at the decoding instant $\tau_i - \tau_i'$ the earliest error would have happened at time 0, the probability of which is at most $2^{-\beta(\tau_i - \tau_i' + 1)}$.

Since the intervals $[\tau_1', \tau_1]$ and $[\tau_2', \tau_2]$ are disjoint, the erasure patterns $E_1$ and $E_2$ correspond to independent channel uses. So we have

$$P\left(Y(\tau_1', \tau_1) \cap Y(\tau_2', \tau_2)\right) \le P(E_1, E_2) = P(E_1) P(E_2)$$

The result now follows. ■
For ease of presentation, we introduce the following definition.

Definition 3 (Error Interval): With respect to the notation in Lemma 1.3, we refer to the interval $[\tau_i', \tau_i]$ as the error interval at time $\tau_i$.
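To make the indexing in Lemma 1.3 and Definition 3 concrete, the sketch below extracts the error interval at a decoding instant $\tau$ from the decoder's estimates. It assumes one information symbol $b_\ell$ per time instant, indexed from 0, and that the estimates of $b_0, \ldots, b_\tau$ made at time $\tau$ are available as a list; both are assumptions for illustration, and the function names are hypothetical.

```python
def error_interval(decoded, truth, tau):
    """Return the error interval [tau', tau] at decoding instant tau, or None.

    decoded[l] is the estimate of b_l produced at time tau and truth[l] is b_l,
    for l = 0, ..., tau. Since tau' = 1 + argmax_l {estimates of b_0..b_l all
    correct}, tau' is simply the index of the first incorrect estimate.
    """
    for ell in range(tau + 1):
        if decoded[ell] != truth[ell]:
            return (ell, tau)
    return None  # no error at time tau

def interval_size(interval):
    """Number of time instants tau - tau' + 1, the factor multiplying beta in (50)."""
    start, end = interval
    return end - start + 1
```

By (50), two disjoint error intervals of sizes 2 and 3 occur together with probability at most $2^{-5\beta}$.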



Before proceeding with the rest of the proof, we recall a lemma from [15] and state it here for easy reference.

Lemma 1.4 (Lemma 7, [15]): In any finite set of intervals on the real line whose union is of total length $s$, there is a subset of disjoint intervals whose union is of total length at least $s/2$.
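Lemma 1.4 is quoted from [15] without proof. One standard constructive argument (not necessarily the one used in [15]) is sketched below: build a minimal subcover greedily from left to right, always taking the interval that extends the covered region furthest. In this subcover no point lies in more than two chosen intervals, so the odd-indexed and the even-indexed members are each pairwise disjoint, and one of the two families has total length at least $s/2$. The function names are ours.

```python
def union_length(intervals):
    """Total length of the union of closed intervals [a, b]."""
    total, cur_a, cur_b = 0.0, None, None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:
            if cur_b is not None:
                total += cur_b - cur_a  # close the previous connected component
            cur_a, cur_b = a, b
        else:
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

def half_cover(intervals):
    """Pairwise-disjoint intervals with total length >= union_length(intervals) / 2."""
    ivs = sorted(intervals)
    chosen, reach = [], float("-inf")
    while True:
        # intervals starting inside the covered region that extend it
        cands = [iv for iv in ivs if iv[0] <= reach and iv[1] > reach]
        if not cands:
            # start a new component at the leftmost interval beyond the reach
            cands = [iv for iv in ivs if iv[0] > reach]
            if not cands:
                break
            start = min(iv[0] for iv in cands)
            cands = [iv for iv in cands if iv[0] == start]
        best = max(cands, key=lambda iv: iv[1])
        chosen.append(best)
        reach = best[1]
    evens, odds = chosen[0::2], chosen[1::2]
    return max(evens, odds, key=lambda fam: sum(b - a for a, b in fam))
```

Disjointness of chosen intervals two steps apart follows from the greedy rule: if the later one started before the earlier one ended, it would have been selected in between instead.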



We will now state a version of Lemma 1.3 for the case where the error intervals are not necessarily disjoint.

Lemma 1.5: If $\{b_i\}_{i \ge 0}$ are encoded and decoded using a causal linear $(R, \beta)$-anytime reliable code, then

Proof: The proof follows directly from Lemma 1.3 and Lemma 1.4.



We use an argument very similar to the one used in proving Theorem 7.1. We define a trellis $\vec{\mathcal{T}}(\mathcal{G})$ exactly the same way we defined $\mathcal{T}(\mathcal{G})$, except that the edges $\vec{\mathcal{E}}_\mathcal{T}$ are now directed and point forward in time, i.e., downwards with respect to Fig. 2(b). In other words, for neighbors $u, v \in \mathcal{V}$, the edge $(v_t, u_{t-1})$ is directed from node $u_{t-1}$ to node $v_t$ and represents the transmission from $u$ to $v$ in round $t$.

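Recall the definition of a time-like sequence of edges $\mathcal{S}_t$ from Definition 2. Let $B^\tau$ be the error interval at decoding instant $\tau$ on the edge $\left(u^{(\tau)}, u^{(\tau-1)}\right) \in \mathcal{E}$. We alternately call $B^\tau$ the error interval on the edge $\left(u_\tau^{(\tau)}, u_{\tau-1}^{(\tau-1)}\right) \in \mathcal{S}_t$. Then we define $|\mathcal{S}_t|$ as follows:

$$|\mathcal{S}_t| = \sum_{(v,u)} |B_{vu}|, \quad \text{where} \tag{51}$$

$$B_{vu} = \bigcup_{\tau : \left(u^{(\tau)}, u^{(\tau-1)}\right) = (v,u)} B^\tau \tag{52}$$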



This definition is motivated by the fact that the packet erasure events during an error interval on a given edge, say $(v,u) \in \mathcal{E}$, are independent of those in an error interval on a different edge $(v',u') \neq (v,u)$ in any round of communication. So, intuitively, $|\mathcal{S}_t|$ captures the number of independent "bad" channel realizations seen by the edges in $\mathcal{S}_t$. In what follows, we will show a connection between the number of wasted communication rounds at node $v_t$ and the number $|\mathcal{S}_t|$.
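The quantity in (51) and (52) can be computed as in the sketch below. It assumes that error intervals are integer ranges of channel uses and measures $|B_{vu}|$ as the number of time instants it contains, so that an interval $[t-1, t]$ contributes 2, in line with the count used in the proof of Lemma 1.6. The input format, a list of (edge, interval) pairs for the edges of $\mathcal{S}_t$ that saw an error, is hypothetical.

```python
from collections import defaultdict

def sequence_weight(error_intervals):
    """|S_t| from (51)-(52): per edge (v, u), the size of the union B_vu of its
    error intervals, summed over edges.

    error_intervals: list of ((v, u), (a, b)) pairs; [a, b] is an integer range
    and its size is taken to be b - a + 1 (an assumed length convention).
    """
    per_edge = defaultdict(set)
    for edge, (a, b) in error_intervals:
        per_edge[edge].update(range(a, b + 1))  # union of intervals on one edge
    return sum(len(instants) for instants in per_edge.values())
```

Overlapping intervals on the same edge are counted once, while intervals on different edges add up, mirroring the independence argument above.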



A witness at node $v_t$ is a time-like sequence of edges $\mathcal{S}_t$ such that $|\mathcal{S}_t| \ge t - n_v(t)$. In Lemma 1.6 we will demonstrate a witness for $v_t$ for all $v \in \mathcal{V}$ and $t \ge 0$. The technique is very similar to the proof of Lemma 1.2, and hence we will only provide a sketch of the proof. After that we will use Lemma 1.5 to prove that $P(t - n_v(t) \ge m) \le (\Delta+1)^t \binom{t}{m} 2^{-m\beta/2}$ for any $v \in \mathcal{V}$.



Lemma 1.6: If after $t$ rounds of communication node $v$ has performed $n_v(t)$ iterations of (17), then there exists a time-like sequence $\mathcal{S}_t$ of $t$ edges in $\vec{\mathcal{E}}_\mathcal{T}$ ending in node $v_t$ with $|\mathcal{S}_t| \ge t - n_v(t)$.



Proof: The proof is obtained by repeating the argument in the proof of Lemma 1.2 with the word 'erasure' replaced by 'tree-code error'. The only case that needs a little clarification is case 2-b-ii, i.e., round $t$ is wasted at node $v$ and $B \cap Z \neq \emptyset$, where $B$ and $Z$ retain the same meaning as before. In this case, as before, there is a neighbor $u \in \mathcal{N}_v$ such that $n_{vu}(t) = n_u(t-1) - 2$. From Algorithm 2, it is clear that the iterate $x_u^{(n_u(t-1)-1)}$ was encoded and transmitted by node $u$ to node $v$ in round $t-1$ or before. Therefore, the error interval on the edge $(v_t, u_{t-1}) \in \vec{\mathcal{E}}_\mathcal{T}$ contains the interval $[t-1, t]$. Let the witness at node $u_{t-1}$ be $\mathcal{S}_{t-1,u}$. Append the edge $(v_t, u_{t-1})$ to $\mathcal{S}_{t-1,u}$ to get a new time-like sequence, which we call $\mathcal{S}_{t,v}$. We claim that $\mathcal{S}_{t,v}$ is a witness at $v_t$. The proof of this claim follows from the following observations:



1) When applying Lemma 1.5, we only need to care about error intervals on the same edge at different times.

2) The edge $(v,u)$ appears in the time-like sequence $\mathcal{S}_{t,v}$ for round $t$, and hence it can possibly appear again in $\mathcal{S}_{t,v}$ only in round $t-2$ or earlier. So the length of the union of the error intervals on the edge $(v,u)$ increases by at least 2 with the addition of the edge $(v_t, u_{t-1})$. Hence we have

$$|\mathcal{S}_{t,v}| \ge |\mathcal{S}_{t-1,u}| + 2 \ge t - 1 - n_u(t-1) + 2 = t - n_v(t)$$

where the last equality uses $n_u(t-1) = n_v(t-1) + 1 = n_v(t) + 1$.



This completes the proof of Lemma 1.6. ■

Putting together Lemma 1.6 and Lemma 1.5, we have

$$P(t - n_v(t) \ge m) \le (\Delta + 1)^t \binom{t}{m} 2^{-m\beta/2}$$

The result now follows.



E. Proof of Theorem 7.2 



The bound $(1-p)^{|\mathcal{E}|}$ is intuitively motivated by the following observation: in a given round of communication, $(1-p)^{|\mathcal{E}|}$ is the probability that none of the edges are erased. As a result, one would expect the fraction of communication rounds in which nodes can perform an iteration of (17) to be approximately $(1-p)^{|\mathcal{E}|}$. This observation alone does not yield a proof, because successful communication could also mean that a node received only 'waits' from its neighbors and hence could not compute an iteration of (17). The proof idea is simple, but conveying it requires some setup. Let $W_{uv}^{(t)}$ denote the event that node $v$ transmits a 'wait' to node $u$ in round $t$. We introduce the following definition.

Definition 4: Consider nodes $v, u, u'$ such that $u \in \mathcal{N}_v$ and $u' \in \mathcal{N}_u$. Also suppose that node $v$ transmits a 'wait' to node $u$ in round $\tau$ and node $u$ transmits a 'wait' to node $u'$ in round $\tau + 1$, i.e., events $W_{uv}^{(\tau)}$ and $W_{u'u}^{(\tau+1)}$ happen. Then $W_{uv}^{(\tau)}$ is said to have caused $W_{u'u}^{(\tau+1)}$ if both of the following conditions hold:

(a) $n_u(\tau - 1) = 1 + n_{uv}(\tau - 1)$

(b) $n_{u'u}(\tau) = n_u(\tau)$

To understand the definition, observe that condition (a) implies that node $v$ is a bottleneck node for node $u$ in round $\tau$, and condition (b) implies that node $u'$ already knows $n_u(\tau)$ after round $\tau$. Node $u$ could not perform a new iteration in round $\tau$ since it received a 'wait' from a bottleneck node (in this case $v$), and hence it sent a 'wait' to node $u'$. So it is natural to blame $W_{uv}^{(\tau)}$ for $W_{u'u}^{(\tau+1)}$. Definition 4 is further justified by the observation that a 'wait' in round $\tau$ will either have an effect in round $\tau + 1$ or never. Also note that Definition 4 can be extended to more than two waits by having conditions (a) and (b) hold for every pair of successive 'wait' events.

With that, we are now ready to state the main lemma. The lemma essentially implies that 'waits' do not loop in the network. In other words, if in round $\tau$ a node $v$ transmits a 'wait', then this 'wait' will not cause the same node $v$ to transmit another 'wait' in a future round $\tau' > \tau$.

Lemma 1.7 ('Waits' do not loop): Consider the sequence of events $\{W_{u_{i+1}u_i}^{(\tau+i-1)}\}_{i=1}^{\ell}$ such that $W_{u_{i+1}u_i}^{(\tau+i-1)}$ is caused by $W_{u_i u_{i-1}}^{(\tau+i-2)}$ for all $2 \le i \le \ell$. Then the nodes $\{u_i\}_{i=1}^{\ell}$ are all distinct.

Proof: Node $u_1$ sending a 'wait' to node $u_2$ in round $\tau$ implies that $n_{u_1}(\tau-1) = n_{u_2u_1}(\tau-1)$. Furthermore, since $W_{u_3u_2}^{(\tau+1)}$ is caused by $W_{u_2u_1}^{(\tau)}$, conditions (a) and (b) in Definition 4 apply. In particular, condition (a) together with the first observation gives $n_{u_2}(\tau-1) = n_{u_1}(\tau-1) + 1$. Since node $u_2$ could not perform a new iteration of (17), we have $n_{u_2}(\tau) = n_{u_2}(\tau-1) = n_{u_1}(\tau-1) + 1$. Repeating this argument for the remaining nodes, we get

$$n_{u_{i+1}}(\tau + i - 1) = n_{u_i}(\tau + i - 2) + 1, \quad \forall\, 1 \le i \le \ell \tag{53}$$

Now suppose the nodes $\{u_i\}_{i=1}^{\ell}$ are not all distinct. In particular, suppose $u_\ell = u_1$. Then from (53), we have $n_{u_1}(\tau + \ell - 2) = n_{u_\ell}(\tau + \ell - 2) = \ell - 1 + n_{u_1}(\tau - 1)$, which is not possible since $n_{u_1}(\cdot)$ can increment by at most 1 in each round and $n_{u_1}(\tau) = n_{u_1}(\tau - 1)$.

One similarly arrives at a contradiction if any other node repeats in $\{u_i\}_{i=1}^{\ell}$. ■



The implication of Lemma 1.7 is clear: if a node $v$ sends a 'wait' in round $\tau$ to any of its neighbors, then this 'wait' will not by itself stop node $v$ from performing an iteration of (17) in a future round.

We are now ready to provide the main argument. Let $d(v,u)$ denote the length of the shortest path from node $u$ to node $v$. So, if $v \in \mathcal{N}_u$, then $d(v,u) = 1$, and $d(v,v) = 0$. Let the diameter of the graph be $\delta$, i.e., $\delta = \max_{u,v \in \mathcal{V}} d(v,u)$. And for an edge $e_{uu'} = (u,u') \in \mathcal{E}$, we define

$$d(v, e_{uu'}) = \min\{d(v,u), d(v,u')\}$$



Let $\mathcal{E}_v^{i} = \{e \in \mathcal{E} \mid d(v,e) = i\}$. In view of Lemma 1.7, it is not difficult to see that an erasure on an edge in $\mathcal{E}_v^{i}$ in round $\tau$ will have an effect (if any) at node $v$ only in round $\tau + i$. Let $A_{i,\tau}$ denote the event that there is an erasure on an edge in $\mathcal{E}_v^{i}$ in round $\tau$. Then for $\tau > \delta$, it is easy to see that $\bigcap_{i=0}^{\delta} \bar{A}_{i,\tau-i}$ implies that round $\tau$ at node $v$ is not wasted, i.e., node $v$ can compute an iteration of (17). In other words,

$$P\left(n_v(\tau) = n_v(\tau-1)\right) \le 1 - P\left(\bigcap_{i=0}^{\delta} \bar{A}_{i,\tau-i}\right) = 1 - (1-p)^{|\mathcal{E}|}$$
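The partition $\{\mathcal{E}_v^i\}$ and the resulting per-round bound $1 - (1-p)^{|\mathcal{E}|}$ can be computed with a breadth-first search, as in the sketch below. It assumes a connected undirected graph given as an adjacency dictionary with comparable (e.g. integer) node labels. Since the sets $\mathcal{E}_v^i$ are disjoint and together cover $\mathcal{E}$, the product of their no-erasure probabilities is $(1-p)^{|\mathcal{E}|}$.

```python
from collections import deque

def edge_layers(adj, v):
    """Partition the undirected edges into E_v^i = {e : d(v, e) = i},
    with d(v, (u, u')) = min(d(v, u), d(v, u')) and hop distances from BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    layers = {}
    for x in adj:
        for y in adj[x]:
            if x < y:  # each undirected edge once (assumes comparable labels)
                layers.setdefault(min(dist[x], dist[y]), []).append((x, y))
    return layers

def wasted_round_bound(adj, v, p):
    """1 - (1 - p)^{|E|}: the bound on P(round tau is wasted at v) for tau > delta."""
    num_edges = sum(len(edges) for edges in edge_layers(adj, v).values())
    return 1 - (1 - p) ** num_edges
```

On the three-node path, node 0 sees one edge at distance 0 and one at distance 1, while the middle node sees both edges at distance 0.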
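Due to the erasure model, the event $A_{i,\tau}$ is independent of $A_{i',\tau'}$ for $(i,\tau) \neq (i',\tau')$. Let

$$X_\tau = \mathbb{1}\left[n_v(\tau) = n_v(\tau-1)\right], \qquad Y_\tau = \mathbb{1}\left[\bigcup_{i=0}^{\delta} A_{i,\tau-i}\right]$$

Then from the above argument, $X_\tau = 1$ implies $Y_\tau = 1$, and $\{Y_\tau\}$ are independent Bernoulli random variables. Note that $P(Y_\tau = 1) \le 1 - (1-p)^{|\mathcal{E}|}$. Let $R' = 1 - m/t$; then we have

$$P\left(t - n_v(t) = m\right) = P\left(\sum_\tau X_\tau = m\right) \le P\left(\sum_\tau Y_\tau \ge m\right) \le 2^{-tD\left(1-R' \,\|\, 1-(1-p)^{|\mathcal{E}|}\right)} = 2^{-tD\left(R' \,\|\, (1-p)^{|\mathcal{E}|}\right)}$$

The last inequality follows from a standard Chernoff bounding technique and is true whenever $R' < (1-p)^{|\mathcal{E}|}$. Union bounding over all nodes $v \in \mathcal{V}$, we have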

This completes the proof. 
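The final Chernoff step can be checked numerically against the exact binomial tail, taking the $Y_\tau$ to be i.i.d. Bernoulli with parameter $q = 1 - (1-p)^{|\mathcal{E}|}$. The sketch below uses arbitrary illustrative parameters and measures divergences in bits to match the base-2 exponent; it also reflects the identity $D(1-R' \| q) = D(R' \| 1-q)$ used in the last equality.

```python
import math

def kl_bits(a, q):
    """Binary divergence D(a || q) in bits, with the convention 0 log 0 = 0."""
    out = 0.0
    if a > 0:
        out += a * math.log2(a / q)
    if a < 1:
        out += (1 - a) * math.log2((1 - a) / (1 - q))
    return out

def binom_tail(t, q, m):
    """Exact P(Y_1 + ... + Y_t >= m) for i.i.d. Bernoulli(q) variables."""
    return sum(math.comb(t, k) * q**k * (1 - q) ** (t - k) for k in range(m, t + 1))

def chernoff_vs_exact(t, p, num_edges, m):
    """Return (exact tail, Chernoff bound 2^{-t D(1 - R' || q)}) with R' = 1 - m/t."""
    q = 1 - (1 - p) ** num_edges
    if m / t <= q:
        raise ValueError("this form of the bound needs m/t > q, i.e. R' < (1-p)^|E|")
    exact = binom_tail(t, q, m)
    bound = 2 ** (-t * kl_bits(m / t, q))  # 1 - R' = m / t
    return exact, bound
```

For example, `chernoff_vs_exact(200, 0.02, 2, 40)` returns an exact tail that is no larger than the bound, as the Chernoff inequality guarantees.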